(WIP) EncodingManager refactoring #1112

karussell · 2017-07-13T12:58:38Z

This is a first sketch of a new approach to improve the flexibility for the edge property storage and it fixes #472, #728, #978 and probably some more issues.

For the moment just look at the nice FastestCarWeighting and how we fetch the values in the constructor that are necessary in the calcWeight method, really easy and understandable:

double speed = edgeState.get(averageSpeed);

These values are composed upfront and are responsible just for a value (and optionally a reverse value), have a look into OSMReaderTest.testEncodedValueBasedEncodingManager for a complete test parsing an OSM file.
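A minimal sketch of the pattern described above: the weighting resolves the value it needs once in the constructor and only reads it per edge in calcWeight. The types `DecimalValue`, `EdgeState` and `FastestWeightingSketch` are hypothetical stand-ins for illustration, not the classes of this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a decimal encoded value; only the name matters here.
final class DecimalValue {
    final String name;
    DecimalValue(String name) { this.name = name; }
}

// Hypothetical stand-in for an edge that exposes its properties per encoded value.
final class EdgeState {
    private final Map<String, Double> values = new HashMap<>();
    final double distanceMeters;
    EdgeState(double distanceMeters) { this.distanceMeters = distanceMeters; }
    EdgeState set(DecimalValue ev, double value) { values.put(ev.name, value); return this; }
    double get(DecimalValue ev) { return values.get(ev.name); }
}

// Looks up the required value once in the constructor, reads it in calcWeight.
final class FastestWeightingSketch {
    private final DecimalValue averageSpeed;

    FastestWeightingSketch(Map<String, DecimalValue> registry) {
        averageSpeed = registry.get("average_speed");
        if (averageSpeed == null)
            throw new IllegalArgumentException("average_speed is not registered");
    }

    // Travel time in seconds; edges without a positive speed get an infinite weight.
    double calcWeight(EdgeState edge) {
        double speedKmh = edge.get(averageSpeed);
        if (speedKmh <= 0) return Double.POSITIVE_INFINITY;
        return edge.distanceMeters / (speedKmh / 3.6);
    }
}
```

Failing early in the constructor means a missing value shows up when the weighting is created rather than in the middle of a route calculation.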

All in all I was able to remove the need for the FlagEncoder classes, so only the EncodingManager and a list of EncodedValues are necessary. Best of all, this already works within the existing system and the new EncodingManager is compatible with the old one.

I've created an EncodedValueFactory to create some default EncodedValues like maxspeed, surface, 'car access' etc. so that useful Weightings can be created.

The main changes in this PR:

long flags was replaced with an IntsRef that is not limited to 64 bits (later on we could use this for all edge properties like distance and pointers; there is also a minor advantage that we could point directly into the storage backing array instead of copying, but this is for a later PR)
new directed EncodedValue that are able to store forward and backward values, no need for expensive reverseFlags method, much simpler usage in Weighting implementations, no need for flag encoders but still backward compatible to them
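The two changes above can be illustrated together. The following is a sketch under stated assumptions: a plain `int[]` stands in for the IntsRef, and `DirectedIntValueSketch` is a hypothetical class that stores a forward and a backward value side by side, so reading the other direction is just a different shift rather than a flag reversal.

```java
// Hypothetical sketch: one unsigned value per direction, packed next to each other
// in one int of an int[] that stands in for the IntsRef mentioned above.
final class DirectedIntValueSketch {
    private final int index;    // which int of the array holds this value
    private final int fwdShift; // bit position of the forward value
    private final int bwdShift; // bit position of the backward value
    private final int maxValue; // largest storable value, also the unshifted mask

    DirectedIntValueSketch(int index, int shift, int bits) {
        if (bits < 1 || shift < 0 || shift + 2 * bits > 32)
            throw new IllegalArgumentException("both directions must fit into one int");
        this.index = index;
        this.fwdShift = shift;
        this.bwdShift = shift + bits;
        this.maxValue = (1 << bits) - 1;
    }

    void set(boolean reverse, int[] ints, int value) {
        if (value < 0 || value > maxValue)
            throw new IllegalArgumentException("value out of range: " + value);
        int shift = reverse ? bwdShift : fwdShift;
        // clear the old bits of this direction, then write the new ones
        ints[index] = (ints[index] & ~(maxValue << shift)) | (value << shift);
    }

    int get(boolean reverse, int[] ints) {
        int shift = reverse ? bwdShift : fwdShift;
        return (ints[index] >>> shift) & maxValue;
    }
}
```

Because the array can have any length, the number of bits per edge is bounded only by how many ints are reserved, not by the width of a long.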

main goals of this PR:

edge storage should allow more than 64bits per edge
easier storing of edge properties -> EncodedValue methods
EncodingManager will be EncodedValues based
avoid reverseFlags

non-goals:

remove FlagEncoders
fix PathExtract
flexible storage for relation and node
solve problem with two bits for access restriction versus speed==0

Further work items:

rename MappedXY to more common DictionaryXY?
use new EncodedValue classes in old FlagEncoder classes: DataFlagEncoder, CarFlagEncoder, ...
merge master into this branch
what to do with the unfavoredEdge used only in QueryGraph -> do we need a 0-bit EncodedValue used just as a 'marker'?
use two EncodedValues (weight and access) for shortcuts in CHGraphImpl -> advantage would be that we would be able to store weight with higher precision
version of every EncodedValue to make loading graph more secure?
avoid Weighting.getFlagEncoder
replace AbstractFlagEncoder and FlagEncoder with something less ugly, separate turn and node flags (Profile == collection of EncodedValue or ComposedEncodedValues)
maybe rename all XY08 into XYOld or XY09 instead? Add a deprecation notice and removal with 1.x
remove all the TODOs and ugly hacks
copyProperties can be done via array copy and setting just the node indices afterwards
define TagParser order
OSMReader is currently doing duplicate work to calculate bits of one OSM way, instead it should calculate the bits once and just array copy to the other edges using edge.get&setData?
InstructionAnnotation getAnnotation -> remove this or replace by a ComposedEncodedValue that can warn about private, tool, ferry or ford access?
separate TurnWeighting handling -> how to do storage of OSM-node and OSM-relation information?
~~merge new EncodingManager with old~~
~~try a real world example and parse OSM with a car profile and query via the new weighting mechanism~~
~~use IntRefs instead of int flags~~
a directed DoubleProperty and BooleanProperty class that holds two values (one value for a direction) that can be easily 'reversed' -> ComposedEncodedValue holds multiple encoded values? support get and getReversed directly for every encodedValue?
~~should we introduce a PropertyFilter so that a maxspeed is not parsed and calculated if the highway tag is not accepted ('rail')?~~
~~move out EncodedValue.parse into something else to make support of non-OSM easier~~
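Two of the open items above (copyProperties via array copy, and computing the bits of one OSM way only once) come down to the same idea. A minimal sketch, assuming a hypothetical flat edge storage `EdgeStoreSketch` in which every edge owns a fixed-size slice of one int array; the real storage layout may differ.

```java
// Hypothetical flat edge storage: every edge owns a fixed-size slice of one int[].
final class EdgeStoreSketch {
    private static final int NODE_A = 0, NODE_B = 1, FIRST_PROPERTY = 2;
    private final int propertyInts;
    private final int intsPerEdge;
    private final int[] data;

    EdgeStoreSketch(int edgeCount, int propertyInts) {
        this.propertyInts = propertyInts;
        this.intsPerEdge = FIRST_PROPERTY + propertyInts;
        this.data = new int[edgeCount * intsPerEdge];
    }

    // Sets the node indices and copies the precomputed way properties in one go.
    void setEdge(int edge, int nodeA, int nodeB, int[] wayProperties) {
        if (wayProperties.length != propertyInts)
            throw new IllegalArgumentException("expected " + propertyInts + " property ints");
        int base = edge * intsPerEdge;
        data[base + NODE_A] = nodeA;
        data[base + NODE_B] = nodeB;
        System.arraycopy(wayProperties, 0, data, base + FIRST_PROPERTY, propertyInts);
    }

    int getNodeA(int edge) { return data[edge * intsPerEdge + NODE_A]; }
    int getNodeB(int edge) { return data[edge * intsPerEdge + NODE_B]; }
    int getProperty(int edge, int i) { return data[edge * intsPerEdge + FIRST_PROPERTY + i]; }
}
```

With this layout the tag parsing for a way runs once, and every edge created from that way only pays for an array copy plus two node writes.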

Later:

allow cross-integer space for EncodedValue otherwise we can get bad constellations e.g. if EncodedValue has two directions and requires e.g. 17 bits
we could introduce a "reverse" EdgeExplorer or iter.setBaseNode(node, reverse=true) and then avoid the reverse parameter in Weighting.calcWeight as well as the if clauses to fetch e.g. the forward or the reverse speed. I.e. one edge is directed API-wise and undirected storage-wise. We should do this undirected vs. directed stuff in a later step, see also (WIP) EncodingManager refactoring #1112 (comment)
store distance as an EncodedValue
But in general we need length dependent mechanism somehow
reimplement the CarFlagEncoder as a ComposedEncodedValue?
PathExtract should be more separated (call extract explicitly after calcPath and potentially rename calcPath to calcShortestPathTree)
solve problem with two bits for access restriction versus speed==0; what to do if a reverse read-out of a oneway edge: exception vs. silently ok? how would we do the exception - via a new setting "read of default value results in exception" or somehow associating the access-EncodedValue with e.g. the max_speed-EncValue?
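The "bad constellations" in the first item of this list can be made concrete. The sketch below is one possible reading, assuming each direction is allocated separately and needs 17 bits; `BitAllocatorSketch` is a hypothetical helper that never lets a value straddle two ints, which is exactly the restriction that cross-integer support would lift.

```java
// Hypothetical allocator: hands out bit ranges and never lets a value straddle two ints.
final class BitAllocatorSketch {
    private int nextIndex = 0;  // int that receives the next value
    private int nextShift = 0;  // first free bit inside that int
    private int wastedBits = 0; // bits skipped because a value did not fit

    // Returns {intIndex, shift} for a value that needs the given number of bits.
    int[] allocate(int bits) {
        if (bits < 1 || bits > 32)
            throw new IllegalArgumentException("bits must be in 1..32");
        if (nextShift + bits > 32) {
            // the value does not fit into the rest of this int: skip to the next one
            wastedBits += 32 - nextShift;
            nextIndex++;
            nextShift = 0;
        }
        int[] position = {nextIndex, nextShift};
        nextShift += bits;
        return position;
    }

    int getWastedBits() { return wastedBits; }

    int getRequiredInts() { return nextShift == 0 ? nextIndex : nextIndex + 1; }
}
```

Allocating 17 bits twice places the second direction at the start of the next int and skips 15 bits of the first one; two ints are reserved for 34 bits of payload.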

boldtrn · 2017-07-21T15:58:33Z

core/src/main/java/com/graphhopper/routing/PathExtract.java

+                    return encoder.isBool(edge.getFlags(), FlagEncoder.K_ROUNDABOUT);
+                }
+            };
+        } catch (Exception ex) {


I am struggling with this a bit. Why do we need the try/catch, especially with a generic exception? Does it have something to do with whether an encoder is stored in the weighting?

Wouldn't it be better to be able to read the state without creating an exception?

Don't worry about this too much. It will and must disappear; I just wanted to make this backward compatible and working.

boldtrn · 2017-07-21T16:07:22Z

core/src/main/java/com/graphhopper/routing/lm/LandmarkSuggestion.java

@@ -40,7 +40,7 @@ public BBox getBox() {
     * to specify an explicit bounding box. TODO: support GeoJSON instead.
     */
    public static final LandmarkSuggestion readLandmarks(String file, LocationIndex locationIndex) throws IOException {
-        // landmarks should be suited for all vehicles
+        // landmarks should be suited for all profiles


What about PT :)?

You mean public transit? In its current form A* should work and so also landmarks but not sure if it works nor if it makes it faster

Yes, I was just wondering as it's stating "suited" meaning, it's a good idea to combine PT with LM. I am not sure if this is true, so I was asking.

boldtrn · 2017-07-21T16:08:41Z

core/src/main/java/com/graphhopper/routing/profiles/BitEncodedValue.java

+/**
+ * This class provides easy access to just one bit.
+ */
+public final class BitEncodedValue extends IntEncodedValue {


Nice, GH missed a Boolean Encoded Value, not sure if we should call it like this instead of Bit?

I made the difference here because boolean implies 1 byte and double implies 8 bytes. For Integer I just didn't find a better word yet.

Ah I meant if we should call it BooleanEncodedValue instead of bit?

A boolean usually uses 1byte (roughly) in the JVM, so I wanted to differentiate here as we really use just 1 bit, but not that important and we can change to Bool or Boolean

boldtrn · 2017-07-21T16:13:05Z

Wow, this change looks impressive! Thanks!

I haven't found the time to give this a full in-depth review yet, I only touched the tip of the iceberg so far. My main concern until now is the "ugliness" that you asked to ignore. I would prefer a way to get the state without provoking an exception.

I think using exceptions as a "if replacement" should be avoided if possible. If not then it is what it is :).
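One way to read state without provoking an exception is to make absence an ordinary return value. This is only a sketch of that general idea, not the solution chosen in the PR; `BooleanLookupSketch` and its `BooleanReader` interface are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry of boolean edge properties keyed by name.
final class BooleanLookupSketch {
    interface BooleanReader {
        boolean read(int flags);
    }

    private final Map<String, BooleanReader> readers = new HashMap<>();

    void register(String name, BooleanReader reader) {
        readers.put(name, reader);
    }

    // Absence is an ordinary result here, not an exceptional one.
    Optional<BooleanReader> find(String name) {
        return Optional.ofNullable(readers.get(name));
    }
}
```

A caller can then write `lookup.find("roundabout").orElse(flags -> false)` and gets a reader that always answers false when no such property is stored, without a try/catch around the lookup.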

karussell · 2017-07-21T16:50:45Z

I think using exceptions as a "if replacement" should be avoided if possible.

Yes, of course. But if statements wouldn't make it more beautiful. We need a different solution, but I'll improve here only if I can solve the following things before: 0. tuning everything a bit more 1. having a better API for some places (e.g. simple set&get usage and the 'reverse' stuff) and 2. trying to import a real data set to see if this is actually fast enough and we can model something more realistic.

boldtrn · 2017-07-22T00:09:57Z

core/src/main/java/com/graphhopper/routing/profiles/EncodingManager.java

+        this.extendedDataSize = Math.min(1, extendedDataSize / 4) * 4;
+    }
+
+    public EncodingManager add(TagParser parser) {


should be only allowed when not initialized?
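The restriction asked about here could look like the following sketch: additions are accepted until an explicit initialization step and rejected afterwards. `FreezableManagerSketch` is a hypothetical class and uses plain strings in place of TagParser instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical manager that accepts parsers only before it has been initialized.
final class FreezableManagerSketch {
    private final List<String> parserNames = new ArrayList<>();
    private boolean initialized;

    FreezableManagerSketch add(String parserName) {
        if (initialized)
            throw new IllegalStateException("cannot add '" + parserName + "' after init()");
        parserNames.add(parserName);
        return this;
    }

    // After this call the bit layout is considered fixed.
    FreezableManagerSketch init() {
        initialized = true;
        return this;
    }

    int size() { return parserNames.size(); }
}
```

Rejecting late additions keeps the bit layout stable once edges have been written with it.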

boldtrn · 2017-07-22T00:12:38Z

core/src/main/java/com/graphhopper/routing/profiles/EncodingManager.java

+        return extendedDataSize;
+    }
+
+    // TODO should we add convenient getters like getStringProperty etc?


Would make it more convenient IMHO.

boldtrn · 2017-07-22T00:16:33Z

core/src/main/java/com/graphhopper/routing/profiles/IntEncodedValue.java

+     *
+     * @return the storable format that can be read via fromStorageFormatToInt
+     */
+    public final int toStorageFormat(boolean reverse, int flags, int value) {


Why are flags int and not long?

Currently it is an array of ints as the underlying RAMDataAccess is an array of ints and so this is the most efficient. But I'm thinking of using a byte array instead, which would be required if the GH node or way IDs exceed the 2^32 integer boundaries. The best would be if we could encapsulate this a bit with something like the IntRefs class and pass this in this method, which would also avoid the need of the public getOffset method.

boldtrn · 2017-07-22T00:17:41Z

core/src/main/java/com/graphhopper/routing/profiles/IntEncodedValue.java

+        return uncheckToStorageFormat(reverse, flags, value);
+    }
+
+    final int uncheckToStorageFormat(boolean reverse, int flags, int value) {


How about unchecked?

boldtrn · 2017-07-22T00:19:00Z

core/src/main/java/com/graphhopper/routing/profiles/MappedDecimalEncodedValue.java

+    private final double precision;
+
+    /**
+     * TODO should we really use precision here or use something like the already used 'factor'?


Mhm, precision is the commonly used word for Double, so this does not sound wrong to me.

boldtrn · 2017-07-22T00:21:56Z

core/src/main/java/com/graphhopper/routing/profiles/MappedDecimalEncodedValue.java

+        toStorageMap = new IntIntHashMap(values.size());
+
+        int index = 0;
+        for (double val : values) {


Not that it would change a lot, but should we sort the list of Doubles first? This would also allow us to accept Collections in general. On the other hand the current List approach might be a bit more robust against adding new elements at the end if no additional bit is required (not sure if this is actually a use case).

Ah BTW, I think we should also check if any double occurs twice.

Duplicates checking makes sense. Regarding the sorting: using Collection instead of a List is already possible now.
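A sketch of the duplicate check agreed on here, assuming that two doubles count as duplicates when they map to the same integer after scaling by the precision. `MappedDecimalSketch` is a hypothetical class, and it uses java.util collections instead of the IntIntHashMap from the diff to stay self-contained.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps a fixed set of decimals to small storage indices.
final class MappedDecimalSketch {
    private final double precision;
    private final Map<Integer, Integer> toStorage = new LinkedHashMap<>();

    MappedDecimalSketch(Collection<Double> values, double precision) {
        if (values.isEmpty())
            throw new IllegalArgumentException("no values given");
        this.precision = precision;
        for (double val : values) {
            // putIfAbsent returns the earlier index if the key is already mapped
            if (toStorage.putIfAbsent(toKey(val), toStorage.size()) != null)
                throw new IllegalArgumentException("duplicate value: " + val);
        }
    }

    private int toKey(double val) {
        return (int) Math.round(val / precision);
    }

    int toStorageIndex(double val) {
        Integer index = toStorage.get(toKey(val));
        if (index == null)
            throw new IllegalArgumentException("unknown value: " + val);
        return index;
    }

    // Smallest number of bits that can hold every index.
    int getBits() {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(toStorage.size() - 1));
    }
}
```

Indices follow the iteration order of the collection, so appending a value at the end of a list keeps the existing indices, which is the robustness argument made for the List approach above.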

boldtrn · 2017-07-22T00:24:20Z

core/src/main/java/com/graphhopper/routing/profiles/StringEncodedValue.java

+        super(name, (int) Long.highestOneBit(values.size()));
+
+        // we want to use binarySearch so we need to sort the list
+        // TODO should we simply use a separate Map<String, Int>?


I would think Map might be faster for larger collections and maybe even easier to use?

I bet a map is only faster for relatively big collections (>30 entries) ... we should benchmark to make this argument ;). "Easier to use" is not an argument here as we can hide it in this class IMO.
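Both lookup strategies discussed here can sit behind the same class, which also makes a later benchmark straightforward. A minimal sketch with a hypothetical `StringIndexSketch`; it makes no claim about which variant is faster.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: string-to-index lookup via binary search or via a map.
final class StringIndexSketch {
    private final String[] sorted;
    private final Map<String, Integer> byName = new HashMap<>();

    StringIndexSketch(Collection<String> values) {
        // binary search requires a sorted array; duplicates are dropped
        sorted = values.stream().distinct().sorted().toArray(String[]::new);
        for (int i = 0; i < sorted.length; i++)
            byName.put(sorted[i], i);
    }

    // Variant 1: binary search on the sorted array, -1 for unknown values.
    int indexViaBinarySearch(String value) {
        int index = Arrays.binarySearch(sorted, value);
        return index < 0 ? -1 : index;
    }

    // Variant 2: hash lookup, -1 for unknown values.
    int indexViaMap(String value) {
        return byName.getOrDefault(value, -1);
    }

    String valueOf(int index) {
        return sorted[index];
    }
}
```

Since callers only see the two index methods, the internal choice can be switched after measuring without touching any caller.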

boldtrn · 2017-07-22T00:26:38Z

core/src/main/java/com/graphhopper/routing/profiles/TagParser.java

+    String getName();
+
+    // TODO Every tag parser has an EncodedValue associated but except for convenient usage in EncodingManager we currently do not need this method
+    EncodedValue getEncodedValue();


If it doesn't hurt but makes the code easier 👍

boldtrn · 2017-07-22T00:28:21Z

core/src/main/java/com/graphhopper/routing/profiles/TagParserFactory.java

+import com.graphhopper.routing.util.AbstractFlagEncoder;
+import com.graphhopper.util.EdgeIteratorState;
+
+public class TagParserFactory {


Isn't this factory a bit of an overkill? Wouldn't it be easier to put this into classes?

What do you mean with 'put this into classes'? Creating a RoundaboutTagParser and a HighwayTagParser etc? (This would introduce a lot more code/overhead IMO :))

I'm not super happy with it now but for me this factory was just convenient to avoid creating the same parsers in tests again and again.

boldtrn · 2017-07-22T00:37:41Z

We need a different solution, but I'll improve here only if I can solve the following things before: 0. tuning everything a bit more 1. having a better API for some places (e.g. simple set&get usage and the 'reverse' stuff) and 2. trying to import a real data set to see if this is actually fast enough and we can model something more realistic.

Sounds good and makes sense. I just saw it when browsing through the code and thought I might add my two cents.

karussell · 2017-07-25T11:22:54Z

I was able to fix a few things (regarding highway filter and oneway storage) so that full Germany is now properly imported (in 6min) and possible to route via car. The following config.properties is necessary

graph.flag_encoders=car
prepare.ch.weightings=no
prepare.min_network_size=0
prepare.min_one_way_network_size=0

and the following parameters are necessary and currently no bidirectional algorithm will work:

vehicle=weighting&weighting=fastest2&algorithm=astar

Such a unidirectional route request from south to north takes ~6sec. Not too bad. Only a smaller fraction (~25%) of astar.runAlgo seems to be related to the new 'dynamic' FastestCarWeighting, which could probably be reduced further by reusing the IntsRef for the next edges.

Somehow the instructions have no street name associated. Will investigate.

The applyWayTags is currently very lightweight and accounts for 0% of CPU time in the profiler.

HendrikLeuschner · 2018-04-02T13:42:49Z

Sure.
These are the different builders: Builders
They are employed via our custom ORSOSMReader which in turn calls the Processingcontext, which is a custom class, where the processWay function calls the builders of all enabled processors.
Currently we have changed the normal OSMReader so that it contains an empty method onProcessWay that is extended in our custom ORSOSMReader. We would choose a similar approach where we employ the calls in the Reader?

karussell · 2018-04-03T23:19:05Z

I do not understand why a new layer (GraphProcessContext) and the changes to the OSMReader are necessary, also the naming is unspecific.

But creating separate classes that (if enabled) add multiple TagParsers to the EncodingManager sounds reasonable. Of course renaming TagParsers to something else is still possible. IMO this responsibility sounds very similar to the old FlagEncoder concept, except that there is now a need to avoid duplication (e.g. parsing&storing maxspeed is required for car and truck but should be stored only once). And this hierarchical aspect could be employed, so we end up with a class hierarchy instead of one big TagParserFactory class.
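The suggestion above can be sketched as follows, assuming hypothetical classes (`ParserSketch`, `ManagerSketch`, `CarParsersSketch`, `TruckParsersSketch`) and illustrative value names: each profile class adds its parsers, and the manager keeps only the first parser per encoded value name so that a shared value such as the maximum speed is stored once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser: only the name of the encoded value it fills matters here.
final class ParserSketch {
    final String encodedValueName;
    ParserSketch(String encodedValueName) { this.encodedValueName = encodedValueName; }
}

// Hypothetical manager that stores every encoded value only once.
final class ManagerSketch {
    private final Map<String, ParserSketch> parsers = new LinkedHashMap<>();

    // Returns false and keeps the existing parser if the value is already registered.
    boolean add(ParserSketch parser) {
        return parsers.putIfAbsent(parser.encodedValueName, parser) == null;
    }

    List<String> getEncodedValueNames() {
        return new ArrayList<>(parsers.keySet());
    }
}

// One small class per profile instead of one big factory.
final class CarParsersSketch {
    static void addTo(ManagerSketch manager) {
        manager.add(new ParserSketch("max_speed"));
        manager.add(new ParserSketch("car.access"));
    }
}

final class TruckParsersSketch {
    static void addTo(ManagerSketch manager) {
        manager.add(new ParserSketch("max_speed")); // shared with car, stored once
        manager.add(new ParserSketch("truck.access"));
    }
}
```

Enabling both profiles then yields three stored values rather than four, because the second `max_speed` registration is ignored.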

HendrikLeuschner · 2018-04-04T09:30:27Z

I do not understand why a new layer (GraphProcessContext) and the changes to the OSMReader are necessary, also the naming is unspecific.

Sorry, I was not clear on this. This is how we do it right now. I will ditch both the Reader and the GraphProcessContext and implement everything in your existing classes + the new classes for separate TagParsers.

karussell · 2018-04-04T13:32:02Z

Ah, no problem. Sounds great, thanks :) !

HendrikLeuschner · 2018-04-10T14:32:59Z

I've rewritten the structure a bit. All TagParsers (though I have not added all yet) now reside in their respective own class: Tagparsers
The needed attributes (like surfaces list) are integrated to ensure good modularity.
The TagParserFactory no longer defines profiles but only creates single modules. The overview of enabled parsers and the profile definitions are now inside the EncodingManager via a simple list of which parsers should be enabled. The EncodingManager also checks whether a parser has already been added and has methods to add bikeParsers, footParsers and carParsers. I don't really like putting the profiles in here, I think they should be moved to a different class or configuration file. So far I also do not know how enabling the different profiles should be handled. I only found a usage of addGlobalEncodedValues.
So far I find this structure much easier to understand and modify, but I wanted to get your opinion(s) before I continue further.

karussell · 2018-04-10T16:19:14Z

Yes, this looks like a good way to go forward.

Now you could remove most of the methods in TagParserFactory or maybe the whole class?

The EncodingManager also checks whether a parser has already been added and has methods to add bikeParsers, footParsers and carParsers

Is such a check necessary if we have the duplicate check for the EncodedValue objects already?

I don't really like putting the profiles in here, I think they should be moved to a different class or configuration file.

Yes, I think so too. But let's fix more important stuff first. These lists should be made private or are they needed outside of it?

BTW: in ~2 weeks I'll finally be able to increase priority and push this further faster :)

HendrikLeuschner · 2018-04-11T14:38:03Z

Now you could remove most of the methods in TagParserFactory or maybe the whole class?

Yes, this is now mostly done. Entire thing is of course still a work in progress.

Is such a check necessary if we have the duplicate check for the EncodedValue objects already?

No, you are right.

I fixed some things and added most of the remaining parsers, except for a few where I was not yet sure how to handle it (namely SpatialRuleId). Cleaned up some of the files too. I will continue to work on this.

HendrikLeuschner · 2018-04-19T19:16:27Z

Hi @karussell, are there in principle different encoded values for all profiles, meaning e.g. bike and bike2 have different EncodedValues for average_speed and access etc.? Including different tag parsers?
Running into this problem while running the tests. One of them fails while trying to find the ev bike2.access for the Bike2WeightFlagEncoder.
If they are shared, there is a need to distinguish the parsers too, possibly without duplicating them.

karussell · 2018-04-20T10:34:25Z

If they are shared, there is a need to distinguish the parsers too, possibly without duplicating them

To make it backward compatible average_speed and access values cannot be shared, yes. E.g. bike2 modifies the speed due to elevation and racing bike is faster, but also access is different as mtb can use paths with a more difficult grade for example. Although I can imagine that making the access shared across bikes should be possible as one can use the Weighting to just avoid these paths (but the space requirement is very low and so this optimization should be done later)

In the future one could further try to calculate average_speed dynamically from other generic values, but especially for non-CH use cases a precomputed value for both is faster at query time.

HendrikLeuschner · 2018-04-20T17:01:47Z

I have not yet created all EncodedValues for all profiles, e.g. MotorcycleFlagEncoder. Therefore I have more failing tests than before (without my changes it's 142, with them it's 216). I don't know when or if I will have the time to add every remaining profile. Would you like me to stage a pull request anyway? Sorry for all the questions.

karussell · 2018-04-20T22:04:24Z

Therefore I have more failing tests than before (without my changes it's 142, with them it's 216).

I'm not sure if I understand this but a PR will make it clear probably.

I don't know when or if I will have the time to add every remaining profile.

Sure

Would you like me to stage a pull request anyway?

Yes, please. You can add [WIP]

Sorry for all the questions.

No need for that really :)

HendrikLeuschner · 2018-05-09T12:34:25Z

Hi,

any update on this so far?
Best, Hendrik

karussell · 2018-05-15T16:25:50Z

I'm currently in the process of defining what should go in this PR and what not.

Here are the main goals of this PR:

edge storage should allow more than 64bits per edge
easier storing of edge properties -> EncodedValue methods
EncodingManager will be EncodedValues based
avoid reverseFlags

non-goals:

remove FlagEncoders
fix PathExtract
flexible storage for relation and node
solve problem with two bits for access restriction versus speed==0

And so I will remove the AccessValue refactoring and create a separate PR, see this separate branch: https://github.com/graphhopper/graphhopper/tree/access_refactoring

Also I'll remove the DefaultEdgeFilter refactoring as it is now in master.

Furthermore the json refactoring won't happen in this PR: #1291 (comment) and we'll use a string based EncodedValue properties storage somehow.

Now I'll try to merge your PR and rebase the changes against the access_refactoring branch. As a first step after this bigger change I'll try to fix the performance problem (currently we are 2x slower) and in parallel we could work on making more tests green.

HendrikLeuschner · 2018-05-28T17:31:51Z

Currently osm tags are only processed before edges are created via parse(IntsRef ints, ReaderWay way). For some of our information we need a parser that is called after edge creation, such as parse(EdgeIteratorState edge, ReaderWay way).
Do you think we could include something like this in the osm reader after edge creation?

On a different note, the optimal case for us would be to have a possibility of defining TagParsers on the fly even from outside the graphhopper repo. This should be some sort of method like TagParserFactory.add(TagParser p) or equivalent in EncodingManager, so that we can maintain our own parsers. I think this would be optimal for other projects built on graphhopper too.

karussell · 2018-05-28T19:23:32Z

Do you think we could include something like this in the osm reader after edge creation?

What would be a use case? Why couldn't you do a post processing after the graph is created?

On a different note, the optimal case for us would be to have a possibility of defining TagParsers on the fly even from outside the graphhopper repo.

This should be already possible, but instead of TagParserFactory you would add it to the EncodingManager (?)

karussell · 2018-08-09T14:30:22Z

This PR is too big. I'll split it into separate ones:

access_refactoring Access refactoring #1436
move from long to IntsRef and avoid that performance will suffer, maybe fetch ints for the whole edge at once (intsref_refactoring branch)
EncodingManager will be EncodedValues based and we avoid calling reverseFlags for every edge access

karussell · 2018-08-28T22:28:41Z

Closing in favor of #1447

karussell added 5 commits July 7, 2017 17:08

trying out Property approach, tests passing

399e550

trying to adapt OSMReader

614323a

made OSMReaderTest working!

bf829d2

rename EncodingManager, XYProperty and EncodedValue to give recent classes the old names

f59acd4

fixed compiler and test issues

2697900

karussell added architecture improvement labels Jul 13, 2017

karussell mentioned this pull request Jul 14, 2017

EncodedDoubleValue allowZero check does not work for small numbers #199

Closed

karussell added 3 commits July 14, 2017 17:40

fixed a few renaming issues, renamed Double to Decimal, deleted AbstractEncodedValue, added MappedDecimalEncodedValue, separated parsing from EncodedValue, introduced TagParserFactory

c79bfc3

minor improvement to parsing the destination tag, #733

08a4fc7

directed encoded values

6fbbf12

boldtrn reviewed Jul 21, 2017

boldtrn reviewed Jul 22, 2017

karussell added 2 commits July 22, 2017 13:17

introducing IntsRef for the previous 'int flags'. Now one sees better that IntsRef is a flexible replacement for flags

e75866b

fixed directed EncodedValue and filters to make full import and routing working

8ac1a3a

minor comment removal

2ee201c

karussell mentioned this pull request Apr 9, 2018

Allow list of extended storages #276

Closed

karussell mentioned this pull request Apr 14, 2018

Don't merge: How to store a custom attribute, read/write #1338

Closed

karussell mentioned this pull request May 7, 2018

Correct elevation on tunnel and bridge #1363

Closed

karussell mentioned this pull request May 12, 2018

Refactor Path class #1110

Open

boldtrn mentioned this pull request May 14, 2018

barrier blocking although access=yes graphhopper/directions-api#80

Open

This was referenced May 25, 2018

Split off api #1378

Merged

Reduce usages of DefaultEdgeFilter(encoder, true, false) #1380

Closed

karussell added the refactoring label May 29, 2018

karussell mentioned this pull request Aug 21, 2018

[WIP] TagParserFactory, Encoder refactoring #1349

Closed

karussell mentioned this pull request Aug 28, 2018

Edge flags refactoring #1447

Merged

13 tasks

karussell closed this Aug 28, 2018

karussell modified the milestone: 0.12 Feb 19, 2019
